Extracting semistructured data from the Web: An XQuery Based Approach

Authors

  • Gilles Nachouki
  • Mohamed Quafafou
Abstract

This paper describes work in progress on extracting information from the web. The work is part of a framework for extracting, interconnecting, and accessing heterogeneous data sources. We present a new approach to information extraction in which the web is viewed as a large database of XML documents, and the XQuery language is used to extract information from this database. An experimental tool has been developed to validate the proposal.

Similar resources

Mining Association Rules from XML Data using XQuery

In recent years XML has become very popular for representing semistructured data and a standard for data exchange over the web. Mining XML data from the web is becoming increasingly important. Several encouraging attempts at developing methods for mining XML data have been proposed. However, efficiency and simplicity are still a barrier to further development. Normally, pre-processing or post-...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

The XML Query Language Xcerpt: Design Principles, Examples, and Semantics

Most query and transformation languages developed since the mid-1990s for XML and semistructured data (e.g. XQuery [1], the precursors of XQuery [2], and XSLT [3]) build upon path-oriented node selection: a node in a data item is specified in terms of a root-to-node path, in the manner of the file selection languages of operating systems. Constructs inspired from the regular expression constr...

Query Algebra for Semistructured Data

With the tremendous growth of World Wide Web (WWW) data, there is an emerging need for effective information retrieval at the document level. Several query languages, such as XML-QL, XPath, XQL, Quilt and XQuery, have been proposed in recent years to provide a faster way of querying XML data, but they still lack generality and efficiency. Our approach towards evolving a framework for querying semistru...

Xcerpt and visXcerpt: From Pattern-Based to Visual Querying of XML and Semistructured Data

With the advent of XML as a format for data exchange and semistructured databases, query languages for XML and semistructured data have become increasingly popular. Many such query languages, like XPath and XQuery, are navigational in the sense that their variable binding paradigm requires the programmer to specify path navigations through the document (or data item). In contrast, some other la...

Publication date: 2001